Agent Tools Developer Guide
This guide covers how to add a new agent tool, how artifacts are laid out on disk, and how caching works for agent runs.
Operational Model
Annolid agent operations are split into two layers:
Self-improving: skills and memory evolve behavior without replacing installed code.
Self-updating: signed update workflow stages and applies software updates with rollback plans.
Self-improving
Skills: loaded with precedence workspace -> managed (~/.annolid/skills) -> bundled.
Hot reload: controlled by skills.load.watch and skills.load.pollSeconds.
Skill manifest validation: frontmatter is validated at load time; invalid manifests are marked unavailable.
Workspace memory: daily notes in memory/YYYY-MM-DD.md and curated long-term notes in memory/MEMORY.md.
Pre-compaction flush: transcript snapshot can be appended before compaction via memory flush helpers.
Memory retrieval plugin: default is local semantic ranking with keyword fallback (workspace_semantic_keyword_v1).
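The skill precedence rule (workspace, then managed, then bundled) can be illustrated with a small sketch. It assumes one sub-directory per skill inside each root, which is an assumption made for illustration; the function name resolve_skills is hypothetical and is not part of Annolid.

```python
from pathlib import Path
from typing import Dict, List


def resolve_skills(roots: List[Path]) -> Dict[str, Path]:
    """Map skill name -> directory; earlier roots win over later ones.

    Hypothetical helper: assumes each skill is a sub-directory of a root.
    """
    resolved: Dict[str, Path] = {}
    for root in roots:  # ordered from highest to lowest precedence
        if not root.is_dir():
            continue  # a missing root (for example no managed skills) is skipped
        for skill_dir in sorted(p for p in root.iterdir() if p.is_dir()):
            # setdefault keeps the first hit, so a workspace skill
            # shadows a managed or bundled skill with the same name.
            resolved.setdefault(skill_dir.name, skill_dir)
    return resolved
```

Calling it with the roots ordered as [workspace, managed, bundled] gives the precedence described above.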
Self-updating
Channel-aware update manager supports stable, beta, and dev.
Pipeline: preflight -> stage -> verify -> apply -> restart marker -> post-check.
Rollback: rollback plan is generated for each run and executed on apply/post-check failures.
Canary policy: rollout can enforce rollback thresholds using sample count, failure-rate, and regression limits.
Safe update service: supports manifest check, artifact staging/download, checksum verification, signature verification, and transaction reporting.
Auto-update: disabled by default; configurable interval+jitter schedule when enabled (ANNOLID_AUTO_UPDATE_* env settings).
GUI controls: AI Model Settings -> Agent Runtime includes auto-update enable/channel/check-now/rollback and bot settings for skill hot reload, memory mode, and skill source locations.
Production safety policy: in production mode (ANNOLID_PRODUCTION_MODE=1 or ANNOLID_ENV=production), signed update manifests and signed non-builtin skills are required.
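The pipeline and rollback rule above can be sketched as follows. This is a minimal illustration, not the update manager's actual code: run_update, the report dictionary, and the callable-per-stage shape are all assumptions. Only the stage names and the "roll back on apply/post-check failure" rule come from this guide.

```python
from typing import Callable, Dict

# Stage names as listed in the pipeline description.
PIPELINE = ["preflight", "stage", "verify", "apply", "restart marker", "post-check"]
# Per the guide, the rollback plan runs on apply/post-check failures.
ROLLBACK_STAGES = {"apply", "post-check"}


def run_update(actions: Dict[str, Callable[[], None]],
               rollback: Callable[[], None]) -> dict:
    """Run stages in order and stop at the first failure (hypothetical sketch)."""
    report = {"completed": [], "failed": None, "rolled_back": False}
    for stage in PIPELINE:
        action = actions.get(stage)
        try:
            if action is not None:
                action()
        except Exception as exc:
            report["failed"] = stage
            report["error"] = str(exc)
            if stage in ROLLBACK_STAGES:
                rollback()  # installed state may have changed, so undo it
                report["rolled_back"] = True
            break
        report["completed"].append(stage)
    return report
```

Failures before apply leave the installed software untouched, so in this sketch they stop the run without invoking the rollback plan.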
How to add a tool
Define the tool by extending the base class in annolid/core/agent/tools/base.py:
Implement run(self, ctx, payload) with your core logic.
Use ctx.results_dir and ctx.run_id to derive stable outputs.
Use ctx.artifact_store if you want to persist artifacts and participate in caching.
Register the tool in the registry:
Add a new tool wrapper in annolid/core/agent/tools/.
Export it from annolid/core/agent/tools/__init__.py.
Register it with ToolRegistry (see annolid/core/agent/tools/registry.py).
Integrate with the runner (Phase 4+):
Compose tools using the registry and a pipeline definition.
Ensure inputs/outputs follow the unified data models in base.py.
Write a minimal test:
Use tiny inputs and validate outputs.
Prefer tests under tests/ that don’t require large external models.
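As a rough sketch of the steps above, the example below shows the shape of a tool with a run(self, ctx, payload) method and a tiny test-style usage. It deliberately does not import Annolid: FakeContext and FrameCountTool are hypothetical stand-ins, and a real tool would extend the base class in base.py and use the real context object, whose full interface is not shown in this guide.

```python
import json
from dataclasses import dataclass
from pathlib import Path
from typing import Any, Dict


@dataclass
class FakeContext:
    """Stand-in context exposing only the two attributes named in this guide."""
    results_dir: Path
    run_id: str


class FrameCountTool:
    """Hypothetical tool; a real one would subclass the base class in base.py."""
    name = "frame_count"

    def run(self, ctx: FakeContext, payload: Dict[str, Any]) -> Dict[str, Any]:
        frames = payload.get("frames", [])
        # Derive a stable output path from results_dir and run_id so that
        # re-running the same run overwrites the same file.
        out_path = ctx.results_dir / f"{ctx.run_id}_{self.name}.json"
        result = {"num_frames": len(frames)}
        out_path.write_text(json.dumps(result), encoding="utf-8")
        return {"output_path": str(out_path), **result}
```

A minimal test then only needs a temporary directory and a three-element frame list; no external model is involved.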
Artifact layout
Artifacts are stored per video results directory and organized as:
<results_dir>/agent.ndjson (default agent output)
<video_name>_000000000.json+ per-frame LabelMe JSON
.agent_runs/<run_id>/ (run-scoped artifacts)
.cache/agent_cache.json (cache metadata for re-run reuse)
The FileArtifactStore resolves paths relative to:
Run artifacts: .agent_runs/<run_id>/...
Cache artifacts: .cache/...
See annolid/core/agent/tools/artifacts.py for helpers.
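The layout above can be expressed as two small path helpers. This is only a sketch of the directory convention described here; the function names are hypothetical and are not the helpers in artifacts.py.

```python
from pathlib import Path


def run_artifact_path(results_dir: Path, run_id: str, name: str) -> Path:
    """Path of a run-scoped artifact: <results_dir>/.agent_runs/<run_id>/<name>."""
    return results_dir / ".agent_runs" / run_id / name


def cache_artifact_path(results_dir: Path, name: str) -> Path:
    """Path of a cache artifact: <results_dir>/.cache/<name>."""
    return results_dir / ".cache" / name
```

For example, cache_artifact_path(results_dir, "agent_cache.json") points at the cache metadata file listed in the layout.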
Caching semantics
Agent runs compute a content hash from:
video path + filesystem stats (size/mtime),
behavior spec (full schema),
run config (stride, max frames, etc.),
model identifiers,
output NDJSON name.
If the cache hash matches and both the NDJSON and annotation store exist, the service returns cached results without re-running the agent.
To disable reuse from the CLI, run:
annolid-run agent --no-cache ...
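A content hash over the inputs listed above might be computed along these lines. The use of SHA-256 over canonical JSON, the function name compute_cache_key, and the exact field names are assumptions for illustration; the guide only specifies which inputs feed the hash.

```python
import hashlib
import json
import os
from typing import Any, Dict, List


def compute_cache_key(video_path: str,
                      behavior_spec: Dict[str, Any],
                      run_config: Dict[str, Any],
                      model_ids: List[str],
                      ndjson_name: str) -> str:
    """Hash every input that should invalidate cached results (sketch)."""
    stat = os.stat(video_path)
    payload = {
        "video": {
            "path": os.path.abspath(video_path),
            "size": stat.st_size,        # changes when the file is re-encoded
            "mtime_ns": stat.st_mtime_ns,  # changes when the file is rewritten
        },
        "behavior_spec": behavior_spec,
        "run_config": run_config,
        "models": sorted(model_ids),
        "ndjson_name": ndjson_name,
    }
    # sort_keys makes the serialization independent of dict insertion order.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Changing any single input, such as the stride in the run config, produces a different key, so the cached results are not reused.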
Citation management tools
Annolid includes built-in BibTeX tooling for paper citation workflows:
CLI:
annolid-run citations-list --bib-file refs.bib [--query ...]
annolid-run citations-upsert --bib-file refs.bib --key mykey --title ... --author ... --year ...
annolid-run citations-remove --bib-file refs.bib --key mykey
annolid-run citations-format --bib-file refs.bib
Agent function tools:
bibtex_list_entries
bibtex_upsert_entry
bibtex_remove_entry
gui_save_citation (save from active PDF/web viewer context)
Examples in Annolid Bot message input:
save citation
list citations
list citations from references.bib for annolid
save citation from pdf as annolid2024 to references.bib
save citation from web
add citation @article{yang2024annolid, title={Annolid: Annotate, Segment, and Track Anything You Need}, author={Yang, Chen and Cleland, Thomas A}, journal={arXiv preprint arXiv:2403.18690}, year={2024}}
save citation from web with strict validation
save citation from pdf without validation
open threejs example two mice
open threejs example brain
open threejs html /tmp/annolid_threejs_examples/two_mice.html
open threejs https://example.org/viewer.html
Default behavior:
save citation first attempts Google Scholar BibTeX lookup from the active paper context, then falls back to Crossref/OpenAlex when needed, and saves the merged entry to .bib.
GUI workflow:
In Annolid Bot input toolbar, click 📚 to open the citation manager.
Manage a .bib file, save citations from active PDF/web context, choose auto-validation or strict mode, view/edit a Source column (URL or PDF path), edit rows inline with year/DOI checks, and remove selected entries.
See also: docs/source/citations_tutorial.md for a full user tutorial.
Operator Commands
Use annolid-run commands for routine operations:
annolid-run agent skills refresh [--workspace <path>]
annolid-run agent skills inspect [--workspace <path>]
annolid-run agent memory flush [--workspace <path>] [--session-id <id>] [--note <text>]
annolid-run agent memory inspect [--workspace <path>]
annolid-run agent eval run --traces <jsonl> --candidate-responses <jsonl> --out <report.json>
annolid-run agent eval build-regression --workspace <path> --out <traces.jsonl> [--min-abs-rating 1]
annolid-run agent eval gate --changed-files <files.txt> --report <report.json> [--max-regressions 0] [--min-pass-rate 0.0]
annolid-run agent feedback add --workspace <path> --rating -1|0|1 [--trace-id <id>] [--comment <text>] [--expected-substring <text>]
annolid-run update check --channel stable|beta|dev [--require-signature]
annolid-run update run --channel stable|beta|dev [--execute] [--require-signature] [--skip-post-check] [--canary-metrics <json>]
annolid-run update rollback --install-mode package|source --previous-version <X.Y.Z> [--execute]
Admin Function APIs
The agent runtime also exposes operator-style function tools:
skills.refresh
memory.flush
eval.run
update.run
update.run requires explicit operator consent phrase for execute=true: APPROVE_ANNOLID_CORE_UPDATE (override with ANNOLID_OPERATOR_UPDATE_CONSENT_PHRASE).
Shell Session Tools
For OpenClaw-style shell lifecycle workflows, Annolid now provides session tools:
exec_start(command, working_dir?, background?, timeout_s?, pty?)
exec_process(action, session_id?, wait_ms?, tail_lines?, text?, submit?)
Supported exec_process.action values:
list, poll, log, write, submit, kill
Notes:
pty is accepted but currently not enabled (pty_supported=false in responses).
Basic dangerous command patterns are blocked at start time.
Runtime policy group
group:runtime now includes exec, exec_start, and exec_process.
Improvement Quality Loop
Anonymized run traces: workspace/eval/run_traces.ndjson captures hashed session/channel/chat IDs and redacted text previews.
Explicit user feedback: workspace/eval/feedback.ndjson stores rating/comment/optional expected substring for promotion signals.
Regression dataset build: combines traces + feedback into eval traces for CI and pre-promotion checks.
Shadow mode: enable ANNOLID_AGENT_SHADOW_MODE=1 to log alternative routing decisions to workspace/eval/shadow_routing.ndjson. Use annolid-run agent skills shadow --candidate-pack <dir> to compare candidate skill packs before promotion.
Governance and Audit
Governance events are stored as NDJSON with default path:
~/.annolid/governance/events.ndjson
You can override it with:
ANNOLID_GOVERNANCE_EVENTS_PATH=/custom/path/events.ndjson
Audited event categories include skill snapshot/refresh changes, memory writes/flushes, update stage/run actions, and rollback outcomes.
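The default path and its environment override can be sketched as below, together with an NDJSON append (one JSON object per line). The helper names and the event fields (timestamp, category, detail) are hypothetical; only the default path and the ANNOLID_GOVERNANCE_EVENTS_PATH variable come from this guide.

```python
import json
import os
import time
from pathlib import Path
from typing import Any, Dict

DEFAULT_EVENTS_PATH = "~/.annolid/governance/events.ndjson"


def governance_events_path() -> Path:
    """Return the override from the environment, else the default path."""
    raw = os.environ.get("ANNOLID_GOVERNANCE_EVENTS_PATH") or DEFAULT_EVENTS_PATH
    return Path(raw).expanduser()


def append_event(category: str, detail: Dict[str, Any]) -> Path:
    """Append one event as a single JSON line (hypothetical record shape)."""
    path = governance_events_path()
    path.parent.mkdir(parents=True, exist_ok=True)
    record = {"timestamp": time.time(), "category": category, "detail": detail}
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, sort_keys=True) + "\n")
    return path
```

Because each event is a self-contained line, the log can be tailed or filtered line by line without parsing the whole file.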
Three.js bot tools
Annolid Bot supports direct Three.js viewer control in GUI sessions.
Function tools:
gui_open_threejs(path_or_url)
gui_open_threejs_example(example_id)
Built-in example IDs:
two_mice_html (default)
brain_viewer_html
helix_points_csv
wave_surface_obj
sphere_points_ply
The bot recognizes natural-language commands such as open threejs example ....
Browser Automation Safety
Annolid supports MCP browser automation with both granular tools and a unified tool:
mcp_browser (single control surface with actions: status|start|stop|navigate|snapshot|screenshot|act|wait)
mcp_browser_navigate, mcp_browser_click, mcp_browser_type, etc.
Navigation hardening:
browser navigation allows http://, https://, and about:blank only.
unsafe schemes such as file://, javascript:, and data: are blocked.
GUI open_url also blocks file://; use an explicit local file path instead.
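The allowlist above can be approximated with the standard library URL parser. This is an illustrative sketch, not the actual navigation guard; the function name is_navigation_allowed is hypothetical.

```python
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {"http", "https"}


def is_navigation_allowed(url: str) -> bool:
    """Allow http(s) URLs and the literal about:blank; reject everything else."""
    candidate = url.strip()
    if candidate.lower() == "about:blank":
        return True
    # urlsplit lowercases the scheme; a URL without a scheme yields "".
    return urlsplit(candidate).scheme in ALLOWED_SCHEMES
```

An allowlist is used rather than a blocklist so that schemes not named in the rule (for example other about: pages) are rejected by default.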
Annolid code/docs Q&A and tutorials
Annolid Bot is optimized to answer Annolid-specific questions from local docs and code context.
It can explain modules, workflows, and settings with file-path references.
It can generate on-demand tutorials for requested topics and levels using the active chat model, grounded by Annolid docs/code evidence.
When a tutorial is saved to Markdown, Annolid Bot auto-opens the generated .md in the embedded web viewer.
Direct command examples:
create on demand tutorial for realtime camera setup in annolid
create beginner tutorial for behavior analysis and save to markdown file
how do i use annolid for behavior analysis
Realtime camera snapshot + email
Annolid Bot can capture a snapshot from a camera stream and send it by email.
Stream snapshot:
GUI sessions: use gui_check_stream_source with save_snapshot=true.
This GUI tool now runs a full camera mission pipeline:
probe -> capture -> annotate -> notify/email
returns explicit camera_mission.steps and delivery status objects.
Non-GUI channels (for example email/IM): use camera_snapshot.
Snapshot files are saved under .annolid/workspace/camera_snapshots/.
Outlook Safe Links camera URLs are automatically unwrapped to the original stream URL.
Source fallback policy is intent-aware:
eye-blink intent defaults to camera 0
network camera intent prefers remembered network streams.
Email with attachments:
Use the email tool with:
to
subject
content
optional attachment_paths (list of local file paths)
Example bot intent:
check wireless camera, save a snapshot, and email it to user@example.com
Realtime email/report spam control:
Realtime bot report interval controls report cadence.
Email requests use an additional minimum interval (bot_email_min_interval_sec, default 60s) to avoid repeated email requests.
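A minimum-interval check of this kind can be sketched as a small gate object. The class name MinIntervalGate and its interface are hypothetical; only the 60-second default mirrors bot_email_min_interval_sec.

```python
import time
from typing import Callable, Optional


class MinIntervalGate:
    """Allow an action at most once per min_interval_sec (illustrative sketch)."""

    def __init__(self, min_interval_sec: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._min_interval = min_interval_sec
        self._clock = clock  # injectable so tests need not sleep
        self._last_allowed: Optional[float] = None

    def allow(self) -> bool:
        now = self._clock()
        if (self._last_allowed is not None
                and now - self._last_allowed < self._min_interval):
            return False  # too soon after the previous allowed request
        self._last_allowed = now
        return True
```

Rejected requests do not reset the timer in this sketch, so a steady stream of requests still lets one through per interval.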
Security and policy hardening (Phase 2)
Adds stricter defaults for tool access and data handling:
Capability-oriented tool profiles:
gui, email, realtime, filesystem
explicit capability expressions are supported, for example:
capability:gui,email
capability:gui+realtime
Snapshot path hardening:
camera_snapshot writes only under workspace camera_snapshots/.
symlink escape paths are rejected.
Redaction-at-source:
private/local stream endpoints are redacted in outbound content.
sensitive metadata keys (for example peer_id, account_id) are redacted before publish.
Runtime high-risk guard:
deny-by-default blocks risky multi-tool chains unless explicit intent is provided.
config toggle: agents.defaults.strict_runtime_tool_guard (default true).
Example config:
{
  "agents": {
    "defaults": {
      "strict_runtime_tool_guard": true
    }
  }
}
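The snapshot path hardening rule listed above (writes only under the workspace camera_snapshots/ directory, symlink escapes rejected) could be enforced along these lines. This is a sketch under the assumption that resolving the path and checking containment is sufficient; resolve_snapshot_path is a hypothetical name, not the camera_snapshot implementation.

```python
from pathlib import Path


def resolve_snapshot_path(workspace: Path, filename: str) -> Path:
    """Return a path inside <workspace>/camera_snapshots or raise ValueError."""
    root = (workspace / "camera_snapshots").resolve()
    # resolve() collapses ".." and follows existing symlinks, so a link
    # that points outside the root is caught by the containment check.
    candidate = (root / filename).resolve()
    try:
        candidate.relative_to(root)
    except ValueError:
        raise ValueError(f"snapshot path escapes {root}: {filename!r}") from None
    if candidate == root:
        raise ValueError("snapshot path must name a file, not the directory")
    return candidate
```

Absolute filenames are also rejected, because joining an absolute path onto the root discards the root before the containment check runs.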
Explicit high-risk intent markers supported by policy/runtime guards:
intent:high-risk
intent:high_risk
allow:high-risk
allow_high_risk
unsafe:high-risk
Session memory and replay
Annolid agent sessions now keep separated memory layers and replayable event logs.
Working memory:
short-horizon session summary derived from recent user/assistant turns.
stored in session metadata as working_memory.
bounded by a character quota in PersistentSessionStore.
Long-term memory:
stable facts/notes derived from session facts and consolidation updates.
stored in session metadata as long_term_memory.
bounded by a character quota in PersistentSessionStore.
Deterministic consolidation and telemetry
Memory consolidation now uses deterministic triggers based on:
session turn counter (turn_counter)
next scheduled consolidation turn (next_consolidation_turn)
history length relative to memory window
Telemetry is persisted in session metadata as memory_telemetry with entries like:
timestamp
outcome (for example llm_consolidated, skipped_short_transcript, not_due)
history_len, archive_len, keep_len
elapsed_ms
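A deterministic trigger over these three inputs might look like the sketch below. The exact comparison rules are assumptions made for illustration (the guide names the inputs and the outcome labels, not the thresholds), and the "consolidate" return value is a placeholder for "proceed to consolidation", not a documented outcome.

```python
def consolidation_decision(turn_counter: int,
                           next_consolidation_turn: int,
                           history_len: int,
                           memory_window: int) -> str:
    """Decide whether to consolidate, using only counters (no clocks, no randomness)."""
    if turn_counter < next_consolidation_turn:
        return "not_due"  # scheduled turn has not been reached yet
    if history_len <= memory_window:
        # Assumption: nothing to archive while history fits in the window.
        return "skipped_short_transcript"
    return "consolidate"  # placeholder label for "run consolidation now"
```

Because the decision depends only on stored counters, replaying the same session state always yields the same outcome, which is what makes the telemetry entries comparable across runs.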
Memory mutation audit trail
Session metadata contains memory_audit_trail entries for memory changes, including:
timestamp
scope (facts, working_memory, long_term_memory)
mutation (for example set_fact, set_working_memory)
reason
turn_id
before_chars / after_chars
Safe replay for debugging
Session event records are stored in metadata key event_log.
Each entry includes:
timestamp
direction (inbound / outbound)
kind (for example user, assistant, progress, final)
optional turn_id, event_id, idempotency_key
payload
GUI/backend helpers:
replay_session_debug_events(session_store=..., session_id=..., direction="", limit=200)
format_replay_as_text(events)
These helpers are implemented in:
annolid/core/agent/gui_backend/session_io.py
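To make the direction and limit parameters concrete, the sketch below filters and formats a list of event entries shaped like the fields listed above. It is not the implementation in session_io.py: select_events and events_as_text are hypothetical names, and treating an empty direction as "no filter" and limit as "most recent N" are assumptions.

```python
from typing import Any, Dict, List

Event = Dict[str, Any]


def select_events(event_log: List[Event], direction: str = "",
                  limit: int = 200) -> List[Event]:
    """Keep events matching direction (empty string keeps all), newest `limit` only."""
    matching = [e for e in event_log
                if not direction or e.get("direction") == direction]
    return matching[-limit:] if limit > 0 else []


def events_as_text(events: List[Event]) -> str:
    """Render one line per event for quick reading in a debug console."""
    return "\n".join(
        f"{e.get('timestamp', '?')} "
        f"[{e.get('direction', '?')}/{e.get('kind', '?')}] "
        f"{e.get('payload', '')}"
        for e in events
    )
```

Replay here is read-only: the helpers only select and print stored entries, so inspecting a session cannot re-trigger tools or outbound messages.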